Molecular Psychiatry (2014) 19, 668-675 
© 2014 Macmillan Publishers Limited All rights reserved 1359-4184/14 



OPEN 



www.nature.com/mp 

ORIGINAL ARTICLE 

708 Common and 2010 rare DISC1 locus variants identified
in 1542 subjects: analysis for association with psychiatric
disorder and cognitive traits

PA Thomson 1,2,9, JS Parla 3,9, AF McRae 4,9, M Kramer 3, K Ramakrishnan 1, J Yao 3, DC Soares 1, S McCarthy 3, SW Morris 1, L Cardone 3,
S Cass 1, E Ghiban 3, W Hennah 1,5, KL Evans 1,2, D Rebolini 3, JK Millar 1, SE Harris 1,2, JM Starr 2, DJ MacIntyre 6, Generation Scotland 7,
AM McIntosh 6, JD Watson 3, IJ Deary 2, PM Visscher 4,8, DH Blackwood 6, WR McCombie 3 and DJ Porteous 1,2

A balanced t(1;11) translocation that transects the Disrupted in schizophrenia 1 (DISC1) gene shows genome-wide significant linkage
for schizophrenia and recurrent major depressive disorder (rMDD) in a single large Scottish family, but genome-wide and exome
sequencing-based association studies have not supported a role for DISC1 in psychiatric illness. To explore DISC1 in more detail, we
sequenced 528 kb of the DISC1 locus in 653 cases and 889 controls. We report 2718 validated single-nucleotide polymorphisms
(SNPs) of which 2010 have a minor allele frequency of <1%. Only 38% of these variants are reported in the 1000 Genomes Project
European subset. This suggests that many DISC1 SNPs remain undiscovered and are essentially private. Rare coding variants
identified exclusively in patients were found in likely functional protein domains. Significant region-wide association was observed
between rs16856199 and rMDD (P = 0.026, unadjusted P = 6.3 × 10^-5, OR = 3.48). This was not replicated in additional recurrent
major depression samples (replication P = 0.11). Combined analysis of both the original and replication set supported the original 
association (P = 0.0058, OR = 1.46). Evidence for segregation of this variant with disease in families was limited to those of rMDD 
individuals referred from primary care. Burden analysis for coding and non-coding variants gave nominal associations with 
diagnosis and measures of mood and cognition. Together, these observations are likely to generalise to other candidate genes 
for major mental illness and may thus provide guidelines for the design of future studies. 

INTRODUCTION

Schizophrenia (SZ), bipolar disorder (BD) and recurrent major 
depressive disorder (rMDD) are common forms of serious mental 
illness, each with a strong and overlapping genetic component. 1-3 
Genome-wide linkage, association, cytogenetic, copy number 
variant and, more recently, sequencing studies establish that 
the genetic architecture of psychiatric illness is complex and that 
there is extensive genetic heterogeneity, which is incompletely 
defined or understood (reviewed in Sullivan et al. 4). We previously
reported a t(1;11) translocation in a single large Scottish family 
that showed genome-wide significant linkage for SZ, rMDD and 
jointly with BD. 5 The t(1;11) translocation is balanced and 
structurally simple, but the outcome is genetically complex, 
disrupting the protein coding gene Disrupted in schizophrenia
1 (DISC1), the antisense non-coding gene Disrupted in
schizophrenia 2 (DISC2) and the non-coding gene DISC1FP1,
creating a DISC1/DISC1FP1 fusion transcript. 6-11

Several small independent studies have reported evidence for
association of individual DISC1 single-nucleotide polymorphisms
(SNPs) (coding and non-coding) or haplotypes with SZ, BD, rMDD
and other neuropsychiatric traits, including autism spectrum
disorder, cognition, normative cognitive ageing, anxiety and
structural and functional brain imaging phenotypes. 9,12,13 Rare
amino-acid substitution variants in DISC1 have been reported in
cases of SZ, 10,11 BD, 14 rMDD, 13 autism spectrum disorder 15 and
agenesis of the corpus callosum, 16 as has an increased burden
of rare missense variants in exon 11 of DISC1 for schizoaffective
disorder, 17 and for DISC1 pathway genes in SZ. 10 In contrast,
a meta-analysis of all known common variants within the DISC
locus, from a total of 11 626 cases and 15 237 controls that
involved the testing of 1241 SNPs, found no evidence that
common variants at the DISC locus are significantly associated
with SZ. 18 Moreover, the DISC1 locus has not reached genome-wide
significance in large-scale meta-analyses of linkage studies of
SZ, 19 nor have its common variants in large-scale genome-wide
association studies of SZ, BD or rMDD. 20-22 A recent exon-based
study that sequenced 2.7 kb of DISC1 in 727 cases of SZ and 733
controls found 32 rare alleles (minor allele frequency (MAF) <0.01)
in SZ cases and 40 in European controls with no evidence for a
significantly increased burden of likely pathogenic variants. 23
DISC1, however, continues to feature strongly in attempts to
assess genome-wide association results in terms of networks 24 or
in combination with known biological function. 25-27

The biological functions of DISC1 fit well with current
aetiological concepts in SZ-related major mental illness
and cognition. 28 DISC1 is a scaffold protein that interacts with,
and modulates the activity of, multiple proteins with key roles in
neurodevelopment, neurogenesis, neuronal migration, integration
and signalling, 29-31 including the antidepressant and antipsychotic
targets GSK3β 32 and PDE4. 33 Several common and rare amino-acid
polymorphisms of DISC1 have predicted deleterious effects
on protein function and demonstrable biological effects
in experimental settings. 31 The 704C allele is associated with
reduced activity of ERK1 and Akt kinases, altered binding affinities
of DISC1 for NDE1 and NDEL1 and variation in DISC1 oligomeric
status. 34-37 The 607F allele results in (a) reduced binding and
centrosomal localisation of PCM1, (b) reduced noradrenaline
neurotransmitter release in SH-SY5Y cells, 38 (c) altered
mitochondrial trafficking 39 and (d) a partial shift from neuronal
to glial expression in the brain. 40 Furthermore, Singh et al. 41
reported that 607F impacts negatively on neural progenitor
proliferation in E16 mouse brain, correlates with aberrant wnt
signalling in human lymphoblasts, and is associated with a
neurodevelopmental phenotype in morpholino mutant zebrafish.
R37W lies within an arginine-rich nuclear localisation motif
and a partially overlapping interaction domain for PDE4 42
and GSK3β. 41,43 The 37W allele shows reduced nuclear
DISC1 expression and altered DISC1 regulation of ATF4, a critical
modulator of cAMP signalling and mediator of the stress
response. 44

In summary, whereas the primary genetic evidence from the
original Scottish t(1;11) family was significant at the genome-wide
level for both SZ and rMDD, and the experimental
evidence and biological plausibility of DISC1 remain strong, the
evidence from subsequent linkage and candidate gene association
studies is inconsistent and not supported by
genome-wide association studies or meta-analysis. 12 To explore
these contrasting findings, we aimed here to establish the nature
and frequency of DISC1 genomic sequence variants, identify
rare variants in putative functional domains, and test for effects
of these on cognitive traits and the risk of psychiatric illness.
We comprehensively sequenced 528 kb covering the entire DISC1
locus, including TRAX (also known as TSNAX), for which there is
evidence for intergenic splicing with DISC1, 45 and the intergenic
region, which contains regulatory elements immediately 5' of
DISC1. 13,46-48



Non-coding variants were annotated using the UCSC table browser for
the following tracks: 'RepeatMasker', 51 'CpG island', 'TFBS conserved', '7x
Reg Potential' (which substantially overlaps with DNAse hypersensitivity
sites) and/or '28-Way Most conserved - PlacMammal'
(http://genome.ucsc.edu/). Sequence variants classified as coding were mapped to the DISC1 L
isoform and potential pathogenicity ascribed using Pmut, 52 Panther 53 and
PolyPhen-2. 54 The coding sequence variants were mapped onto
a list of known curated DISC1-interactor binding sites 31 and with other
functional elements (for example, phosphorylation sites 55). Case-control
association was tested on the combined case samples as well as
individually for SZ, BP and rMDD using Fisher's exact test. Permutation
was used to derive region-wide P-values and significance thresholds.
Quantitative trait association analyses using LBC1936 samples were
performed by linear regression of the trait residuals (adjusted for age
and sex) on the number of minor alleles at each SNP, with empirical
P-values estimated by permutation to avoid issues with the test statistic
distribution caused by the combination of rare variants and slight
deviations from normality in the phenotypes. All association analyses
were performed using PLINK. 56 Mark-recapture analysis followed the
Lincoln-Petersen and Modified Petersen methods 57,58 with 95% confidence
intervals calculated following Chapman. 59 Burden analysis was performed
in PLINK/SEQ to implement BURDEN and VTTEST with empirical P-values
estimated using permutation. Genotyping of the replication and familial
samples was performed by the Edinburgh Wellcome Trust Clinical
Research Facility Genetics Core using TaqMan SNP genotyping assay
C 33950433_10 with concurrent genotyping of known heterozygotes.
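
As an illustration of how region-wide empirical P-values of this kind can be obtained, the following minimal sketch applies a max-statistic permutation scheme. It is not the PLINK procedure used in the study: the function names (`allele_diff`, `region_wide_p`), the simple allele-count statistic and the default number of permutations are assumptions chosen for brevity.

```python
import random


def allele_diff(genotypes, labels):
    """Absolute difference in mean minor-allele count, cases (1) vs controls (0)."""
    cases = [g for g, lab in zip(genotypes, labels) if lab == 1]
    controls = [g for g, lab in zip(genotypes, labels) if lab == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def region_wide_p(snps, labels, n_perm=1000, seed=0):
    """Empirical region-wide P-value for each SNP (hypothetical helper).

    snps   : list of per-SNP genotype lists (0, 1 or 2 minor alleles per subject)
    labels : case (1) / control (0) status, one entry per subject

    For every permutation of the labels, the largest statistic across all
    SNPs is recorded; a SNP's P-value is the fraction of permutations in
    which that maximum reaches the SNP's observed statistic.
    """
    rng = random.Random(seed)
    observed = [allele_diff(g, labels) for g in snps]
    exceed = [0] * len(snps)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stat = max(allele_diff(g, shuffled) for g in snps)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # Add-one correction so that an empirical P-value is never exactly zero.
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Because the maximum is taken over every SNP in the region, the resulting P-values are already adjusted for the number of variants tested, which is the sense in which a threshold is "region-wide".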

Data access 

The accession numbers for sequence data are NCBI ss472328925-ss472331023.

RESULTS 

Sequence analysis 

We sequenced 1542 Caucasians from Scotland comprising
240 cases of SZ, 221 cases of BD, 192 cases of rMDD and, as
controls, 889 members of the Lothian Birth Cohort 1936
(LBC1936), who have been extensively phenotyped. 60 Each
sample was sequenced to >80% coverage at a minimum of 30-fold
read depth by long-range PCR and sequencing on either
Illumina GAII or HiSeq 2000 sequencers. To ensure a robust
data set, all variants within repetitive regions were removed.
Final quality score thresholds for the data were derived from
capillary sequence validation of 10% of the remaining variants. All
variants with an MAF <1% were validated by ABI3730
sequencing. After quality control, there was no evidence for
sequencing bias between cases and controls (Supplementary
Figure S1). Allele frequencies from our sample showed strong
concordance to those from the European subset of the 1000
Genomes Project 61 (Supplementary Figure S2). We report 2718
SNPs in the 1542 samples analysed, 708 at ≥1% and 2010 at
<1% MAF (Supplementary Table S1). Only 1027 of the 2718 SNPs
(38%) were previously reported in the European subset of the
1000 Genomes Project. 61

As defined and annotated by the UCSC genome browser (http:// 
genome.ucsc.edu), 489 SNPs mapped to regions of regulatory 
potential, 177 to non-coding exons (including DISC2) and 36 to 
coding regions of exons. Of these 36 variants, 12 were 
synonymous changes, 23 were non-synonymous changes, with 
one producing a stop codon consistent with the DISC1 Es isoform 
(Figure 1; Supplementary Figure S3; Supplementary Tables S1 and 
S2). Supplementary Table S3 summarises the overlap between 
variants identified in this study and other DISC1 sequencing 
studies and relevant association studies. 10,11,13,14,16,17,23,46



MATERIALS AND METHODS 

A full summary of the methods can be found in Supplementary 
Information. Briefly, all study participants gave signed consent for their 
data and samples to be used in studies that have been approved by the 
appropriate Research Ethics Committee or the GS access Committee. 
Genomic DNA from each individual was whole genome amplified 
in triplicate, the products pooled and amplified with primer pairs tiled 
across 528 kb of TRAX/DISC1 (hg18 chr1:229723339-230251606; hg19
chr1:231656716-232184983). For each sample, the pooled products were
sheared, converted into paired-end Illumina libraries and sequenced on an
Illumina GAII or HiSeq 2000 sequencer to >80% coverage and >30-fold
depth. Sequences were aligned to the UCSC hg18 reference sequence, 
variants called using MAQ software 49 and the variants in repeats removed. 
Ten percent of all remaining variants were validated using Sanger 
sequencing chemistry on an ABI3730 sequencer, and the derived 
information used to optimise the quality control filters. After quality 
control screening, all exonic and low frequency (MAF<1%) variants were 
also validated by Sanger chemistry sequencing as above. The variants were 
functionally annotated using SNPnexus 50 (http://www.snp-nexus.org). 



Association and segregation of common variants with psychiatric 
illness and related quantitative traits 

Genome-wide association studies of SZ, BP and rMDD are most 
consistent with a polygenic liability for common variants, but they 

Figure 1. TRAX/DISC1 (Disrupted in schizophrenia 1) genomic and exon structure: alignment of coding and regulatory variants. (a) Three-period
moving average of all single-nucleotide polymorphisms (SNPs) identified per 5 kb across the region in this study with TRAX/DISC1 intron/exon
structure given to scale. Total SNP number (black), those with a minor allele frequency (MAF) of <1% (blue), those ≥1% MAF
(green), rs16856199 (arrow). For comparison, the number of SNPs identified in the 1000 Genomes Project (red, dashed) and the number of bases
repeat masked (top black) and 7x regulatory potential (top blue) are also shown. Exon and intron structure of TRAX and DISC1 are drawn to
scale. (b) The position and diagnoses of exonic or regulatory SNPs. SNPs not seen previously (underlined), synonymous SNPs (black) and
non-synonymous SNPs (red), stop or putative splice SNPs (green). Novel SNPs not previously reported in the European samples of the
1000 Genomes Project (v3.20101123) or the NHLBI GO Exome Sequencing Project (ESP6500) or relevant sequencing and association
studies 10,16,17,64,65 are underlined.



also imply that there is real 'missing' genetic variation, which is
most likely due to risk variants having low frequency in the
population. To test for evidence of DISC1 association, we applied
Fisher's exact test across all variants and all diagnoses
(Figure 2). There was no evidence for SNP association at genome-wide
levels of significance for any diagnosis when considered
separately or combined, nor was there evidence for locus-wide
association of variants with SZ or BP. We did detect a novel,
locus-wide empirical association (P = 0.026; OR = 3.48, 95% CI =
1.95-6.23; unadjusted P = 6.3 × 10^-5) between intronic variant
rs16856199 and rMDD. We reasoned that individual risk alleles
might segregate with disease in families. Twelve
additional family members were available for genotyping for four
rs16856199 carriers. The rs16856199 risk allele segregated with
rMDD in all four families (Supplementary Figure S4).
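
The single-SNP statistics quoted above (an unadjusted Fisher's exact P-value, an odds ratio and its 95% confidence interval) can be reproduced in outline for any 2 × 2 table of allele counts. The sketch below is a standard-library illustration only: the function names are hypothetical, the confidence interval uses the Woolf (log odds ratio) approximation, which the source does not state was the method used, and the example counts are invented rather than taken from the study.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls, columns are minor/major allele counts.
    The P-value sums the hypergeometric probabilities of every table with
    the same margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))  # tolerance for float ties
    return min(1.0, total)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    Assumes all four cells are non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


# Invented allele counts for illustration only (NOT the study's data):
# 20 minor / 364 major alleles in cases, 28 minor / 1750 major in controls.
p_value = fisher_exact_two_sided(20, 364, 28, 1750)
or_est, ci_low, ci_high = odds_ratio_ci(20, 364, 28, 1750)
```

The unadjusted P-value from such a test is what is then compared against the permutation-derived region-wide threshold.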

We next tested for association of rs16856199 with depression in
three additional sample sets: a group of individuals referred from
primary care to a hospital outpatient clinic (n = 467 rMDD
patients), and two population-based samples drawn from primary
care as part of the Generation Scotland: Scottish Family Health
Study consisting of 645 cases with rMDD and 690 cases with single
episode MDD. All three groups were compared with 4017 controls
drawn exclusively from Generation Scotland: Scottish Family
Health Study (Supplementary additional text and Table S4). No
significant association was seen with any individual replication set
or all three combined (best P = 0.088). Analysis of all three rMDD
sets, both the original set and the two rMDD replication sets, was
supportive of association (1112 rMDD, 4017 controls; P = 0.0058,
OR = 1.46, 95% CI = 1.12-1.91). Combined analysis of both sets of
individuals referred from primary care showed stronger nominal
association (P = 0.00065, OR = 1.76, 95% CI = 1.27-2.44). No
association was seen in the combined analysis of both the rMDD and
MDD population-based replication sets (P = 0.41). The risk allele
for rs16856199 did not segregate with rMDD in 10 families of
carriers identified from the Generation Scotland replication
sample (Supplementary Figure S5). This suggests that there is
increased evidence for association of rs16856199 in the more
severely affected individuals.

SNP rs16856199 is on the Affymetrix 6.0 array, but the best
tagging SNP on the Illumina 660W-Quad, HumanHap and Human1M-Duo
arrays is rs16856189. SNP rs16856189 has an r^2 of 0.27
with rs16856199, which may explain in part why this association
has not been reported previously in genome-wide association
studies. 22,62,63 SNP rs6678723, which lies 2.1 kb distal to this SNP
within intron 11, showed the most significant association of DISC1
in the recent mega-analysis of depression (P = 0.0092). 20

The LBC1936 has quantitative measures of symptoms of 
anxiety, depression and the personality trait of neuroticism, plus 
psychometrically tested measures of cognitive ability (fluid 
(age sensitive) and crystallised (non-age sensitive)) and cognitive 
ageing, 60,64 which have been shown to be highly heritable and 
polygenic. 65,66 Association of these traits with DISC1 was tested by
linear regression analyses, co-varied for age at testing and sex 
(Supplementary Figures S6-S8). There were no region-wide 
significant findings for any of these quantitative traits. 
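
The quantitative trait analysis described here, regression of age- and sex-adjusted trait residuals on minor-allele count with permutation-based P-values, can be sketched as follows. This is a simplified stand-in for the PLINK analysis, not a reproduction of it: the function names are hypothetical, the residuals are assumed to have been computed beforehand, and the two-sided test is an assumption.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (x must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p(genotypes, residuals, n_perm=2000, seed=1):
    """Two-sided empirical P-value for the regression slope.

    genotypes : minor-allele counts (0, 1 or 2) per subject at one SNP
    residuals : trait values already adjusted for covariates (assumed input)

    Shuffling the residuals breaks any genotype-trait relationship, so the
    permuted slopes form a null distribution that does not rely on the
    trait being normally distributed or the variant being common.
    """
    rng = random.Random(seed)
    observed = abs(slope(genotypes, residuals))
    shuffled = list(residuals)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(genotypes, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A region-wide threshold for these traits would additionally take the maximum statistic across SNPs within each permutation, as in the case-control analysis.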



Estimating the net pool of DISC1 variants

To estimate the effective pool of common and rare sequence 
variants in the European population, we applied a 'mark-recapture' 
approach (see Supplementary methods) to our data and 
that of the 1000 Genomes Project (v3.20101123) 67 after
appropriate checks on read depth and Sanger sequence 
validation (Supplementary Table S5; Supplementary Figure 9). 

Figure 2. Region-wide association analysis for schizophrenia, bipolar and recurrent major depressive disorder. Nominal P-values for Fisher's
exact tests are plotted against genomic location (hg18) across the TRAX/DISC1 (Disrupted in schizophrenia 1) locus. Reference lines represent 1%
(dashed) and 5% (solid) region-wide empirical thresholds. Only the association of rs16856199 and recurrent major depressive disorder remains
significant at the 5% threshold (arrow).



The total number of DISC1 SNPs ≥1% MAF was estimated at 905
(95% CI = 905 ± 5), of which 901 (99.5%) are known (Supplementary
Table S6). The number of rare variants (<1% MAF) is less
confidently predicted, but is likely to be substantially higher
(95% CI 3777 ± 252) (Supplementary Table S6). Thus, despite the
~2500 European genomes in which the DISC1 locus has been
completely sequenced and the 2305 rare DISC1 variants now
known, ~40% or more remain to be discovered, and will be
essentially 'private'.
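
The mark-recapture logic treats the variants found in one study as "marked" and asks how many of them are "recaptured" by a second, independent study. A minimal sketch of the Lincoln-Petersen estimator and the Chapman modification is given below. The function names are hypothetical, and the confidence interval uses the variance approximation commonly quoted for the Chapman estimator, which is an assumption about, not a statement of, the exact calculation behind Supplementary Table S6. Only the final line uses figures from the text.

```python
from math import sqrt


def lincoln_petersen(n1, n2, m):
    """Estimated total variants: n1 and n2 found by each study, m found by both."""
    return n1 * n2 / m


def chapman(n1, n2, m, z=1.96):
    """Chapman-modified estimate with an approximate confidence interval."""
    estimate = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    variance = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)
                / ((m + 1) ** 2 * (m + 2)))
    half_width = z * sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width


def undiscovered_fraction(known, estimated_total):
    """Share of the estimated variant pool that has not yet been observed."""
    return 1 - known / estimated_total


# Figures from the text: 2305 rare variants known, about 3777 estimated.
rare_undiscovered = undiscovered_fraction(2305, 3777)  # about 0.39, i.e. ~40%
```

The estimate is most stable when the overlap m is large relative to n1 and n2, which is why the common-variant total is tightly bounded while the rare-variant total is not.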



Rare amino-acid substitutions 

Of the 17 rare coding SNPs previously reported for
DISC1, 10,16,17,68,69 we identified 12 (70.6%) plus an additional
8 non-synonymous variants, of which 5 are also absent from
the European samples of both the 1000 Genomes Project
(v3.20101123) 61 and the Exome Variant Server (NHLBI GO Exome
Sequencing Project (ESP), Seattle, WA; http://evs.gs.washington.edu/EVS/;
September, 2012), and previous DISC1 sequencing
studies 10,11,13,14,16,17,23,46,70 (Figure 1; Supplementary Figure S3;
Supplementary Tables S2 and S3).

Five variants, R37W, T453M, T603M, L607F and S704C, are 
predicted to be deleterious by all three applied prediction 
algorithms (PolyPhen-2, Pmut and Panther; Supplementary 
Table S2). From this set, the functional effects of the common 
variants L607F (rs6675281) and S704C (rs821616) have been well 
documented in the literature, as mentioned earlier. R37W lies
within a defined nuclear localisation signal 71 and PDE4B binding
site 72 and is seen in a single case of rMDD (discussed at the end of
this section). T453M is present at low frequency in cases and
control individuals, both in this study and others (Supplementary
Table S3). T603M was only identified in a single control, but
Song et al. 11 reported a T603I variant in a schizophrenic individual
that was absent in their set of control individuals (Supplementary
Table S3).

Five variants were only observed in affected individuals from
our study (R37W, A83V, W160L, R233K and R418H), but not in 889
control individuals. Four of these non-synonymous case-only
singletons are located in the largest coding exon, DISC1 exon 2,
and the remaining variant is in DISC1 exon 4 (Figure 1). A83V was
seen in a single individual with BD; this variant is predicted to be
deleterious by PolyPhen-2 and Pmut and has been shown to affect
wnt signalling. 41 It was, however, observed at low frequency in
controls and in individuals with partial agenesis of the corpus
callosum in previous studies (Supplementary Table S3). 11,14,16
Apart from R37W and A83V, none of the three remaining
non-synonymous variants, W160L, R233K and R418H, is consistently
predicted by the three prediction algorithms to have functional
effects (Supplementary Table S2). 160L and 418H were detected in
single SCZ individuals and have also previously been reported
in individuals with SCZ; but 160L has also been detected in
control individuals. 10 The variant 233K has not been previously
reported, and was identified in an individual with rMDD. No
non-synonymous variants in TRAX were found in cases only. A single
stop mutation was identified in a control individual and produces
an alternative stop site for the DISC1 Es isoform. These variants
can now be tested for potential impact on DISC1 biophysical
properties, protein interaction and biological function. 29-31

Evidence for familial segregation was sought for five rare exonic 
variants where additional family members were available, but 
none segregated perfectly or unequivocally with diagnosis 
(Supplementary Figure 9). Of note however was the identification 
of the non-synonymous amino-acid variant R37W (rs137948488),
first reported 68 in a subject with SZ, and seen here in a single case 
of rMDD. R37 is strictly conserved among orthologues and recent 
publications, including our own, have demonstrated biological 
effects of 37W on DISC1 interactions, 32,73 and shown a dominant-negative
effect on the sub-cellular distribution of DISC1. 44 Five
additional family members of the 37W carrier were available for
genotyping; they were diagnosed with rMDD, generalised anxiety disorder,
bipolar II or no psychiatric diagnosis at the time of assessment.
The R37W mutation was present in relatives with rMDD and 
generalised anxiety disorder, but not in a relative with bipolar II, or 
any unaffected individual (Figure 3). 



Burden analysis for putative functional variants 
To explore the burden of SNPs of potential functional significance, 
all variants with MAF <1% were first validated by ABI3730 
sequencing. There was no significant overall difference in the 
number of singleton variants (Supplementary Figure S10) or in the 
overall number of minor alleles by diagnosis (see Supplementary 
Methods). SNPs were classified on the basis of bioinformatic 
annotation into seven functional classes: those in exons including 
untranslated exons, coding sequence, non-synonymous coding 
SNPs, conserved regions, regions with regulatory potential, 
conserved transcription factor binding sites and CpG islands 
(see Materials and methods and Supplementary Table S7). 
The empirical P-values for the burden analysis were obtained by 
permutation correcting for the multiple thresholds tested, but not 
for the multiple functional subgroups or diagnostic classes 
(SZ, BP and rMDD and all cases combined), therefore all results 
are reported as nominal significance values (Supplementary Table 
S8; Supplementary Methods). Details of nominally significant 
results are given in Table 1. For rMDD only, there was a nominally 
significant (P = 0.044) excess of minor alleles for SNPs with 
regulatory potential across all frequencies, and for rare SNPs in 
conserved regions with MAF ≤0.18% (P = 0.022). Nominally
significant association was found in the LBC1936 data between 

Figure 3. Segregation of the R37W polymorphism with psychiatric
diagnoses in a small Scottish family. The proband of the family is
indicated (arrow). The codon containing the T allele encodes for the
amino acid tryptophan (W) and the codon containing the A allele
encodes arginine (R). rMDD, recurrent major depressive disorder;
BP2, bipolar II.



the burden of minor alleles across all frequencies for SNPs in 
conserved transcription factor binding sites and increased 
symptoms of depression, a measure of depressed mood at the 
time of testing (Table 1). In addition, a nominally significant 
increase in burden of minor alleles for SNPs in CpG islands or 
coding SNPs was observed with Moray House Test scores, 
measures of cognitive ability (Supplementary Table S9). Summaries
of the nominally significant results are given in Table 1.
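
To make the idea of a variable-threshold burden test concrete, the sketch below sums each subject's minor alleles over the SNPs of a functional class that fall at or below a MAF threshold, takes the best case-control excess over several thresholds, and uses label permutation so that the resulting P-value accounts for the multiple thresholds tested. It is a deliberately simplified illustration and not the PLINK/SEQ BURDEN or VTTEST implementation: the raw mean difference (rather than a standardised score), the one-sided direction and all function names are assumptions.

```python
import random


def minor_allele_freq(geno):
    """Per-SNP minor-allele frequency from a subjects x SNPs genotype matrix."""
    n_subjects = len(geno)
    return [sum(row[j] for row in geno) / (2 * n_subjects)
            for j in range(len(geno[0]))]


def burden_excess(geno, labels, maf, threshold):
    """Mean minor-allele burden in cases minus controls for SNPs with MAF <= threshold."""
    keep = [j for j, f in enumerate(maf) if f <= threshold]
    scores = [sum(row[j] for j in keep) for row in geno]
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def vt_permutation_p(geno, labels, thresholds, n_perm=1000, seed=2):
    """One-sided empirical P-value for the best excess over all thresholds."""
    rng = random.Random(seed)
    maf = minor_allele_freq(geno)

    def best(current_labels):
        return max(burden_excess(geno, current_labels, maf, t) for t in thresholds)

    observed = best(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # The maximum over thresholds is recomputed for every permutation,
        # which is what corrects for having optimised the threshold.
        if best(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

As in the analysis reported here, a P-value obtained this way is corrected for the thresholds explored but not for the number of functional classes or diagnoses examined, so it remains nominal.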



DISCUSSION 

Diagnoses of SZ, BP and rMDD were all present in the original
Scottish family carrying the translocation that disrupts DISC1. All
three DSM-IV diagnoses have a strong and overlapping genetic
component, but robust statistical analysis of gene-level contributions
to risk is complicated by extensive genetic heterogeneity
within and between diagnoses. 4 We have provided the most
comprehensive landscape of genetic variation at the DISC1 locus
to date in patients with this spectrum of psychiatric illness and in
healthy population controls with quantitative measures of mood
and cognition. Comparison between our sequencing study and
that of the 1000 Genomes Project confirms that current genome-wide
association studies effectively capture the majority of
common (but not rare) variants in the European population. Our
sample size is large by current sequencing study standards, but
we lack power to detect genome-wide significant P-values for
either common or rare variants (see Supplementary Information
and Supplementary Figure S11 for further details and also Kiezun
et al.). 74 Indeed, the predicted abundance of independent rare
variants at this (and any other given) locus makes it highly
improbable that any one will contribute to illness in the population
at a frequency that will be statistically significant, given the numbers
of patients we can afford to analyse by direct sequencing. 74

We observed no evidence for association at the whole-genome 
level of statistical significance between individual rare or 
common variants and either psychiatric illness or cognition. This 
is consistent with recent findings, 74 which suggest that much 
larger samples would be required to detect such associations. 
Burden analysis of multiple rare and/or deleterious putative 
functional variants also failed to show association with these traits. 
We do report both functional and putative regulatory variants that 
are both individually, and by functional classification, nominally 
associated with rMDD and/or cognitive ability at the locus-wide 
level of significance. 

Our study identified a novel association between intronic SNP 
rs16856199 and rMDD in hospital-referral subjects. Segregation
with diagnosis in the relatives of these probands corroborated the 
association, but further studies are required to understand the lack 
of replication in population-based cohorts with depression. This 
may be due to inherent differences between patients recruited 
from hospital-referral compared with those from population- 
based cohorts. Cohorts from primary care are more likely to have a 
family history of depression, 75 and may have more physical and 
psychiatric comorbidity in general. Conversely, the population- 
based sample may have shorter, less severe episodes 76 than the 
hospital-based cohorts. 77 However, given the modestly significant 
P-value for rMDD in the discovery cohort, the number 
of psychiatric traits examined and the lack of replication, it is 
possible that the observed association is due to chance. 

The nominal associations of the burden of common (threshold 
MAF = 30.7%) and rare potentially regulatory variants (threshold 
MAF = 0.060%) to measures of cognitive ability merit further 
study. A yet-to-be-defined subset of these is likely to have critical
roles in spatial and temporal regulation of transcription and
splicing. 46,78 This highlights the need for annotation tools with
improved predictive value for non-coding variants. 79 

More importantly, our study demonstrates that substantial 
coding and non-coding genetic variation at the DISC1 locus

Table 1. Summary of nominally significant burden results

Case-control

Trait | SNP subset | Test | Average excess of variants in cases | Optimised MAF threshold (N. SNP) | P-value
rMDD | RegPot | BURDEN | 1.39 | - | 0.044
rMDD | PhastCon | VTTEST | 1.20 | 0.0018 (31) | 0.022

Quantitative traits

Trait | SNP subset | Test | Effect size (beta) | Optimised MAF threshold (N. SNP) | P-value
Symptoms of depression (a) | TFBS | BURDEN | 0.062 | - | 0.032
Moray House Test at age 70 (b) | PhastCon | VTTEST | -0.074 | 0.0013 (21) | 0.030
Moray House Test at age 70 adjusted for the Moray House Test score at age 11 (b) | CpG | VTTEST | -0.064 | 0.00060 (2) | 0.047
Moray House Test at age 11 (b) | Coding | VTTEST | -0.101 | 0.00070 (12) | 0.028
Moray House Test at age 11 (b) | CpG | VTTEST | -0.097 | 0.31 (3) | 0.040

Abbreviations: MAF, minor allele frequency; rMDD, recurrent major depressive disorder; SNPs, single-nucleotide polymorphisms.

See main text for trait descriptions and Supplementary Table S8 for results on all diagnoses and traits. Quantitative trait analysis was one-tailed under the
hypothesis that an increased burden of minor alleles would reduce scores for cognitive traits and increase scores for anxiety, depression and neuroticism.
References for all quantitative traits are available in the open-access Lothian Birth Cohort protocol paper (Deary et al. 60). Coding: SNPs in protein-coding
regions of exons including both synonymous and non-synonymous SNPs. PhastCon: SNPs within regions of conservation in placental mammals (PhastCon, UCSC).
RegPot: SNPs with putative regulatory potential (UCSC, 7x regulatory potential). CpG: SNPs in CpG islands commonly associated with promoter regions.
(a) Tested using the Hospital Anxiety Depression Scales. (b) The Moray House Test is the general cognitive test (mostly a verbal reasoning, IQ-type test) that was
used in the Scottish Mental Survey 1947.



remains undiscovered. Despite sequencing over 1500 subjects, 
we have probably captured only ~40% of the extant DISC1
variants in just the European population. Crowley et al. 23
sequenced 2.7 kb of DISC1 exons and 5' and 3' regulatory
sequence in 1460 samples of European or African origin. 
We observed 13/38 (34%) of the variants genotyped in the 
replication phase, supporting the argument for an abundance 
of rare variants. 

The level of sequence variation identified in our study is 
unlikely to be exceptional, and indeed is consistent with evidence 
emerging from other genome sequencing studies. 74 
Consequently, it will be challenging to demonstrate robust 
(replicated) association by statistical evidence alone in case- 
control studies, exceptionally so with the numbers of patients that 
are currently affordable for sequencing. The original t(1;11) family
illustrates the added issue of variable penetrance and cross- 
boundary diagnosis for a given mutation: ~70% of carriers had 
SZ, BP or rMDD, but ~30% had no formal psychiatric diagnosis, 
yet t(1;11) carriers, including both affected and unaffected, had 
ERP P300 measures in the range typical of individuals with SZ. 5 
The original identification of 37W in a case of SZ and here in a case 
of rMDD (and two offspring, one with rMDD, the other generalised 
anxiety disorder) may suggest variable penetrance of this 
biologically functional variant. 44 Of note, the 37W variant was 
not observed in 10 000 control individuals, 11 the 1000 Genomes 
project, 61 the NHLBI GO Exome Sequencing Project (ESP), nor any 
of our 889 control samples. These findings on R37W reinforce the 
probable importance of this domain for DISC1 subcellular 
distribution and binding of interacting proteins 31,44 and add to 
the weight of evidence for other functional studies of DISC1 
amino-acid substitutions. 41 Each observed amino-acid substitution
provides a similar opportunity to tease out the relationships
between genotype and phenotype and between structure and
function. 29,31 Overall, these results demonstrate a high level
of sequence variation in DISC1, a subset of which may contribute
to psychiatric disorder in some individuals who will be typically
rare in the population, precluding classical statistical analysis and
requiring biological validation. This predicts a population-specific
contribution of rare causal variants to risk. 80 Our results indicate
the potential value of sequencing non-coding regions of the
genome, which may harbour disease-associated regulatory
variants. Our findings of both functional and putative regulatory
variants nominally associated with depression and cognitive
ability merit replication in independent samples and biological
exploration.